Abstract. This paper summarises the contributions to EuroCALL’s CorpusCALL
SIG Symposium for the year 2020. In line with this year’s EuroCALL conference 
theme, ‘CALL for widening participation’, the Symposium centred around the 
theme of Data-driven learning for languages other than English. This paper gives 
a brief overview of developments and challenges when using Data-Driven Learning 
(DDL) to teach French, German, Italian, and Spanish. As research suggests, a DDL 
approach has been effectively utilised to teach these languages. However, there are 
differences in available DDL resources and corpora for the respective languages that 
are appropriate for language teaching. The main challenges for future developments 
are also discussed. 
1. Introduction


This paper shares with the wider DDL community developments in using DDL to teach
Languages Other Than English (LOTEs). As the literature on DDL has
primarily focused on studies in the context of teaching English (Chambers, 2019), 
we provide brief overviews of the current state of DDL in relation to the teaching 
of French, German, Italian, and Spanish. Each overview discusses challenges and
proposes solutions for realising the full potential of the DDL approach. First, an
empirical study of DDL for French is reported. Next, we provide a brief overview 
of the range and effectiveness of corpus resources used for teaching and learning 
German and indicate directions for future resource development and empirical 
research. We then trace a brief historical overview of DDL for Italian, with an 
indication of the main challenges that the field faces today. Finally, challenges of 
DDL for teachers and learners of Spanish are discussed. 


2. DDL for French: linking professional 
communication skills and linguistic features 


Research papers from the French DDL community mainly report on indirect 
applications (Vyatkina, 2020a), with learner corpora analysed as error repositories 
(Dubois, Kamber, & Dekens, 2013) or as resources for designing learning materials 
(Di Vito, 2013). Direct applications are mentioned within the context of academic 
writing (Jacques & Rinck, 2017) and French for specific purposes (Rodgers & 
Chambers, 2011). Here we present the results of a study focusing on the direct use 
of a small, specialised corpus by a group of 12 international engineering students 
enrolled on a professional writing course for advanced learners of French as a 
foreign language (target level: B2-C1). The study aimed to determine whether
guided observation of corpus data could help these students better understand 
recurrent language errors in their first drafts of technical specification documents, 
in French called ‘Cahier des Clauses Techniques Particulières’ (CCTP). We chose
14 CCTP samples to create a corpus accessible via Sketch Engine (Kilgarriff et 
al., 2014). In this corpus, we identified linguistic features corresponding to the 
professional communication skills targeted (see Table 1). The observed errors
mainly correspond to these features. 


Table 1. Professional communication skills and linguistic features 


Professional communication skills    Linguistic features

Be neutral and objective             Passive voice and
                                     noun/adjective/verb agreement

Avoid mentioning agents              Impersonal structures or pronouns

Mention norms and standards          Verbs (‘inform’, ‘prescribe’,
                                     ‘schedule’, ‘contain’)

Describe or specify                  Demonstrative pronouns (celui, celle,
                                     ceux/celui-ci, celle-ci, ceux-ci,
                                     celles-ci) [the one, those/this one, these]


During the course, the participants completed worksheets containing activities
partly inspired by their own errors, and they answered two online questionnaires. 
The data obtained provide information about the learners’ progress and remaining needs. We
conclude from this study that the specialised CCTP corpus offers enough data to 
support students who have to write a pedagogical version of a CCTP. However, 
more training time is needed to better explain to them the technical features of 
Sketch Engine. They also need to learn how to notice linguistic features and report 
their findings. 


To advance DDL for L2 French, we recommend choosing a user-friendly
corpus tool and concentrating on learning issues. The content of the corpus must 
correspond to the writing task and the query activities should focus on the observed 
learning needs. 


3. DDL for German: available resources, 
learning outcomes, and future directions 


The subfield of DDL for German, like the broader DDL field, can be divided into 
pedagogical materials, classroom reports, and empirical research. The subfield’s 
origins go back to the turn of the 21st century (e.g. Dodd, 2000; St. John, 2001). 
In the most recent synthesis of DDL research, Boulton and Vyatkina (forthcoming) 
identify 14 empirical studies that explored the effectiveness of DDL for teaching 
German. Like most DDL research (ca. 90% of which has been dedicated to teaching 
English), studies on DDL for German primarily focus on university contexts and 
DDL interventions developed and administered by the researchers themselves. They 
report improved learner knowledge of German lexico-grammar and pragmatics as 
well as writing, translation, and interpreting skills and favourable learner attitudes. 
The geographic coverage of these studies is encouragingly broad, including seven 
countries and three continents, which attests to the generalisability of the findings.
While more studies are needed in university contexts, promising future directions 
could also include an expansion of DDL for German to primary and secondary 
schools. 


A unique feature of the German subfield is the availability of several large, well- 
designed, sustainable, and open-access corpora. The missing link between these 
rich resources and a broader German-learning and German-teaching population 
is teacher and learner DDL guides, written in accessible language and tethered to 
specific corpora. One such guide to using the DWDS corpus (http://dwds.de) and
associated DDL exercises are currently being developed and gradually released
with open access at the University of Kansas (Vyatkina, 2020b). It is hoped that
other DDL researchers can use this resource as a model for “bringing corpora to the 
masses” (Boulton, 2011, p. 69) in DDL for German and beyond. 


4. DDL for Italian: studies, practices, 
and future prospects 


The studies on DDL for Italian cover a time span of at least 27 years. A solid 
starting point can be traced back to 1993, when Polezzi published her pioneering 
work in ReCALL. Polezzi (1993) showed how a corpus of Italian for specific 
purposes could be built and used with beginner learners of Italian enrolled in a 
postgraduate course in Renaissance Studies. She supported the idea of a didactic 
language corpus, identifying the characteristics that would make such a corpus 
suitable for specific language learning needs. 


Since then, the number of studies on DDL for Italian has risen steadily but not steeply. To
the best of our knowledge, there are no more than 20 in total, consisting mostly of
descriptive studies (e.g. Corino & Marello, 2009), with still very few empirical
studies (e.g. Forti, 2019).


The pedagogical practices adopted in the context of DDL for Italian have been 
closely linked to the characteristics of available corpora. While freely accessible 
reference corpora of Italian are available, they were primarily built by researchers 
for researchers. As a result, their pedagogical potential is generally restricted to 
the development of paper-based materials and to advanced-level learners. The first 
learner-friendly corpus exploration tool for Italian was developed very recently, 
within the SkELL platform (Baisa & Suchomel, 2014). 


Bridging the teacher-researcher gap (Chambers, 2019) is one of the main 
challenges that DDL for Italian faces today. Integrating corpora in teacher training 
programmes, publishing teacher guides and developing more learner-friendly 
corpus exploration tools are ways to help bridge this gap. 


5. DDL for Spanish: attitudes and tasks
in the use of corpora 


DDL did not have a name in Spanish until fairly recently. Two terms were coined 
(aprendizaje basado en datos and aprendizaje guiado por datos). The field adopted 
the former, likely thanks to the seminal article by Asención-Delaney et al. (2015),
which reported the profusion of pedagogical articles and the shortage of empirical 
studies. Since then, the field has experienced a steady growth of empirical research 
in DDL with both native and learner corpora as sources (Benavides, 2015; Yao, 
2019). 


In terms of resources, there are vast open-access native corpora, such as Corpus 
del Español (BYU) or CORPES XXI, and also important learner corpora (such
as CAES, Aprescrilov, CEDEL 2). Among the numerous pedagogical articles, 
the scope of learning targets has widened from lexico-grammar to pragmatics, 
discourse features and pronunciation (using oral corpora), and varieties of Spanish. 
Corpus-based tasks can also be found in recently published textbooks (e.g. Aula 
Internacional 4, Prisma C2), which is helping to spread DDL among practitioners 
and learners. 


Despite this growth spurt, DDL is very far from being normalised in Spanish as a 
foreign language teaching practice. One main challenge lies in changing teachers’ 
attitudes towards corpus use through training programmes and by integrating corpus use
in the syllabus. As in other LOTEs, most Spanish teachers do not seem to be aware 
of the benefits of using corpora in language teaching. In addition, there is a need 
for ready-made materials and “online corpus user guides for teachers and exercises 
integrated with specific corpora” (Vyatkina, 2020a, p. 364) that can inspire teachers 
to develop their own corpora. 


6. Conclusions 


This brief overview of DDL research for LOTEs has shown that DDL has been used
effectively for teaching the languages considered. Challenges to DDL often centre
around availability of appropriate corpora and tools for practitioners. The paper 
concentrated on a handful of European languages. Further reviews should explore 
developments of DDL within a wider geographical scope, including, for example, 
Arabic, Mandarin, and Russian. 